{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is Datashader?\n",
    "\n",
    "**Datashader turns even the largest datasets into images, faithfully preserving the data's distribution.**\n",
    "\n",
    "Datashader is an [open-source](https://github.com/bokeh/datashader/) Python 2 and 3 library for analyzing and visualizing large datasets. Specifically, Datashader is designed to \"rasterize\" or \"aggregate\" datasets into regular grids that can be viewed as images, making it simple and quick to see the properties and patterns of your data. Datashader can plot a billion points in a second or so on a 16GB laptop, and scales up easily to out-of-core,  distributed, or GPU processing for even larger datasets.\n",
    "\n",
    "This page of the getting-started guide will give a simple example to show how it works, and the following page will show how to use Datashader as a standalone library for generating arrays or images directly\n",
    "([2-Pipeline](2-Pipeline.ipynb)).  Next we'll show how to use Datashader as a component in a larger visualization system like [HoloViews](http://holoviews.org) or [Bokeh](http://bokeh.pydata.org) that provides interactive plots with dynamic zooming, labeled axes, and overlays and layouts ([3-Interactivity](3-Interactivity.ipynb)).  More detailed information about each topic is then provided in the [User Guide](../user_guide/).\n",
    "\n",
    "## Example: NYC taxi trips\n",
    "\n",
    "To illustrate how this process works, we will demonstrate some of the key features of Datashader using a standard \"big-data\" example: millions of taxi trips from New York City, USA.  First let's import the libraries we are going to use and then read the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datashader as ds\n",
    "import pandas as pd\n",
    "from colorcet import fire\n",
    "from datashader import transfer_functions as tf\n",
    "\n",
    "df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here you can see that we have a variety of columns with data about each of the 10 million taxi trips here, such as the locations in Web Mercator coordinates, the distance, etc.  With datashader, we can choose what we want to plot on the `x` and `y` axes and see the full data immediately, with no parameter tweaking, magic numbers, subsampling, or approximation, up to the resolution of the display:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')\n",
    "tf.set_background(tf.shade(agg, cmap=fire),\"black\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here you can immediately see that the data points are aligned to a street grid, that some areas have much more traffic than others, and that the quality of the signal varies spatially (with some areas having blurry patterns that indicate GPS errors, perhaps due to tall buildings). Getting a plot like this with other approaches would take quite a bit of time and effort, but with Datashader it appears in milliseconds without trial and error.\n",
    "\n",
    "The output above is just a bare image, which is all that Datashader knows how to generate directly.  But Datashader can integrate closely with Bokeh, HoloViews, and GeoViews, which makes it simple to allow interactive zooming, axis labeling, overlays and layouts, and complex web apps.  For example, making a zoomable interactive overlay on a geographic map requires just a few more lines of code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import holoviews as hv\n",
    "from holoviews.element.tiles import EsriImagery\n",
    "from holoviews.operation.datashader import datashade\n",
    "hv.extension('bokeh')\n",
    "\n",
    "map_tiles  = EsriImagery().opts(alpha=0.5, width=900, height=480, bgcolor='black')\n",
    "points     = hv.Points(df, ['dropoff_x', 'dropoff_y'])\n",
    "taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire, width=900, height=480)\n",
    "\n",
    "map_tiles * taxi_trips"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can select the \"Wheel Zoom\" tool on the right and then do panning and zooming (with the scroll bar).  The maps will update as you zoom, but the datashaded image will only update if you have a live Python process running. If you do have Python \"live\", each time you zoom in, the data will be re-aggregated at the new zoom level, converted to an image, and displayed embedded on the map data, making it simple to explore and understand the data.\n",
    "\n",
    "At the most basic level, Datashader can accept scatterplot points (as above), line segments (for time series, trajectories, and area plots), rasters (already gridded data to be regridded), or trimeshes (irregularly gridded data), and can turn each of these into a regularly sampled array or the corresponding pixel-based image.  The rest of this getting-started guide shows how to go from your data to either images or interactive plots, as simply as possible. The next [getting-started section](2_Pipeline.ipynb) breaks down each of the steps taken by Datashader, using a synthetic dataset so that you can see precisely how the data relates to the images. The [user guide](../user_guide/) then explains each of the steps in much more detail."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
